Analysis device, analysis method, and analysis program

ABSTRACT

A classification unit classifies messages included in a text log depending on types, and gives an ID set for each type to each of the classified messages. A creation unit creates, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID. A pattern extraction unit extracts a plurality of patterns, which are combinations of the IDs, from the matrix created by the creation unit. A removal unit removes a part or whole of the patterns from the matrix. A determination unit calculates a degree of importance for each element included in each of the patterns, and determines whether the degree of importance is equal to or higher than a predetermined threshold. A sequence extraction unit extracts a sequence.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a divisional application of U.S. patent application Ser. No. 16/499,870, filed Oct. 1, 2019, currently pending, which is the U.S. national stage of PCT International Patent Application No. PCT/JP2018/013937, filed Mar. 30, 2018, which claims priority to Japanese Patent Application Nos. 2017-074052, 2017, 2017-074053, 2017-074054, and 2017-074055, filed on Apr. 3, 2017. The entire disclosures of all of the foregoing are hereby incorporated by reference into the present disclosure.

FIELD

The present invention relates to an analysis device, an analysis method, and an analysis program.

BACKGROUND

System monitoring using text logs such as syslog and management information base (MIB) information has been conventionally performed for anomaly detection and state analysis in server systems and network systems. For example, when a fault has occurred in a system, text logs are manually searched by a particular keyword, and a message including the keyword is extracted as a critical message.

CITATION LIST

Non Patent Literature

-   Non Patent Literature 1: Kenji Yamanishi, “Anomaly detection with data mining”, 2009, Kyoritsu Shuppan
-   Non Patent Literature 2: Hiroshi Sawada, “Nonnegative Matrix Factorization and Its Applications to Data/Signal Analysis”, The journal of the Institute of Electronics, Information and Communication Engineers, Vol. 95 No. 9, pp. 829-833, September, 2012
-   Non Patent Literature 3: Tatsuaki Kimura, et al. “Spatio-temporal factorization of log data for understanding network events.” IEEE INFOCOM 2014-IEEE Conference on Computer Communications. IEEE, 2014.

SUMMARY

Technical Problem

The conventional system monitoring using text logs, however, has a problem in that it is difficult to efficiently analyze a massive amount of text logs to obtain useful information. For example, when the number of types and amount of text logs are massive due to scaling and complication of system configurations, it is difficult to manually perform efficient analysis. When a critical message is to be extracted by searching with a particular keyword, useful information included in a message not extracted may be overlooked.

Solution to Problem

To solve the problems as described above, an analysis device includes: a classification unit configured to classify messages included in a text log output from a system for each type, and give an ID set for each type to each of the classified messages; a creation unit configured to create, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; a pattern extraction unit configured to extract a plurality of patterns, which are combinations of the IDs, from the matrix created by the creation unit; a removal unit configured to remove a part or whole of the patterns from the matrix; a determination unit configured to calculate a degree of importance for each element included in each of the patterns, and determine whether the degree of importance is equal to or higher than a predetermined threshold; and an information extraction unit configured to extract, from the text log, predetermined information on an element whose degree of importance has been determined by the determination unit to be equal to or higher than the predetermined threshold.

To solve the problems as described above, an analysis method to be executed by an analysis device includes: a step of classifying messages included in a text log output from a system for each type, and giving an ID set for each type to each of the classified messages; a step of creating, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; a step of extracting a plurality of patterns, which are combinations of the IDs, from the matrix created at the step of creating; a step of removing a part or whole of the patterns from the matrix; a step of calculating a degree of importance for each element included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold; and a step of extracting, from the text log, predetermined information on an element whose degree of importance has been determined at the step of determining to be equal to or higher than the predetermined threshold.

Advantageous Effects of Invention

According to the present invention, a massive amount of text logs can be efficiently analyzed to obtain useful information.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram illustrating an example of a configuration of an analysis device according to a first embodiment.

FIG. 2 is a diagram illustrating an example of text logs according to the first embodiment.

FIG. 3 is a diagram illustrating an example of a data configuration of dictionary information according to the first embodiment.

FIG. 4 is a diagram for describing creation of templates according to the first embodiment.

FIG. 5 is a diagram illustrating an example of a classified text log according to the first embodiment.

FIG. 6 is a diagram illustrating an example of a log matrix according to the first embodiment.

FIG. 7 is a diagram illustrating an example of an image visualizing text logs in one month as a log matrix with a duration of one hour according to the first embodiment.

FIG. 8 is a diagram for describing decomposition of a log matrix according to the first embodiment.

FIG. 9 is a diagram illustrating an example of an image visualizing a basis matrix according to the first embodiment.

FIG. 10 is a diagram illustrating an example of an image visualizing a weighting matrix according to the first embodiment.

FIG. 11 is a diagram for describing removal of frequent patterns according to the first embodiment.

FIG. 12 is a diagram for describing removal of frequent patterns according to the first embodiment.

FIG. 13 is a diagram illustrating an example of an image visualizing a basis matrix from which frequent patterns have been removed according to the first embodiment.

FIG. 14 is a diagram illustrating an example of an image visualizing a weighting matrix from which frequent patterns have been removed according to the first embodiment.

FIG. 15 is a diagram for describing profiling of text logs according to the first embodiment.

FIG. 16 is a diagram for describing extraction of sequences according to the first embodiment.

FIG. 17 is a flowchart illustrating the flow of processing by the analysis device according to the embodiment.

FIG. 18 is a diagram illustrating an example of a data configuration of dictionary information according to other embodiments.

FIG. 19 is a diagram illustrating an example of a computer on which an analysis device is implemented by executing a computer program.

DESCRIPTION OF EMBODIMENTS

An analysis device, an analysis method, and an analysis program according to embodiments of the present application are described in detail below with reference to the drawings. The present invention is not limited by the embodiments described below.

Configuration in First Embodiment

First, a configuration of an analysis device according to a first embodiment is described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of the configuration of the analysis device according to the first embodiment. As illustrated in FIG. 1, an analysis device 10 includes a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, and a control unit 15.

The communication unit 11 communicates data with other devices through a network. For example, the communication unit 11 is a network interface card (NIC). The input unit 12 receives inputs of data from a user. For example, the input unit 12 is an input device such as a mouse and a keyboard. The output unit 13 outputs data by display on a screen. For example, the output unit 13 is a display device such as a display.

The storage unit 14 is a storage device such as a hard disk drive (HDD), a solid state drive (SSD), and an optical disc. Note that the storage unit 14 may be a data rewritable semiconductor memory such as a random access memory (RAM), a flash memory, and a non-volatile static random access memory (NVSRAM). The storage unit 14 stores therein an operating system (OS) and various kinds of computer programs to be executed by the analysis device 10. The storage unit 14 further stores therein various kinds of information used for the execution of computer programs. The storage unit 14 stores output log information 141 and dictionary information 142 therein.

The storage unit 14 stores text logs output from a system as the output log information 141. For example, text logs are output from a server machine, a personal computer, and a storage constituting a computer system. For example, text logs are output from a router, a firewall, a load balancer, an optical transmission device, and an optical transmission relay constituting a network system. Output text logs may relate to the overall system, or may relate to a device constituting a system. Furthermore, text logs may be output from environments where a computer system and a network system are virtualized.

Text logs may be output from a plant, a generator, and a machine tool. Text logs may be output from vehicle devices such as a vehicle, an airplane, and a train. Text logs may be output from compact electronic devices such as home electrical appliances, mobile phones, and smartphones. Text logs may be output from sensor devices for measuring biological bodies such as humans and animals and biological information on the biological bodies.

Examples of the text logs include OS syslog, execution logs of applications and databases, error logs, operation logs, MIB information obtained from network devices, alerts of a monitoring system, activity logs, and operating state logs.

FIG. 2 is a diagram illustrating an example of a text log according to the first embodiment. As illustrated in FIG. 2, each record of a text log 51 includes a message and the date of occurrence attached to the message. For example, the first record of the text log 51 includes a message “LINK-UP Gigabitethernet 0/0/0” and the date of occurrence “2016/12/01T15:01:31”. Note that the message may include additional information such as a host and a log level.

The storage unit 14 stores data for classifying messages of the text log as the dictionary information 142. FIG. 3 is a diagram illustrating an example of a data configuration of the dictionary information according to the first embodiment. As illustrated in FIG. 3, the dictionary information 142 includes IDs and templates. The ID is information for identifying the type in which a message of the text log is classified. The template is a character string used to classify a message of the text log. For example, whether a message is classified to a type with an ID “601” is determined by using a template “LINK-UP Interface *”. Note that the classification of messages using the dictionary information 142 is performed by a classification unit 151. Specific processing by the classification unit 151 is described later.

The control unit 15 controls the overall analysis device 10. For example, the control unit 15 is an electronic circuit such as a central processing unit (CPU) and a micro processing unit (MPU) or an integrated circuit such as an application specific integrated circuit (ASIC) and a field programmable gate array (FPGA). The control unit 15 has an internal memory for storing therein computer programs and control data defining various processing procedures, and executes processing by using the internal memory. When various computer programs operate, the control unit 15 functions as various processing units. For example, the control unit 15 includes the classification unit 151, a creation unit 152, a pattern extraction unit 153, a removal unit 154, a determination unit 155, a significant log extraction unit 156, and a sequence extraction unit 157.

The classification unit 151 classifies messages included in a text log output from a system for each type, and gives an ID set for each type to each of the classified messages. Each of the classified messages is hereinafter referred to as “template”. As described above, the classification unit 151 performs the classification by using the dictionary information 142.

The dictionary information 142 may be created by the classification unit 151. For example, the classification unit 151 can create a template based on a character string obtained by deleting a parameter from a message. Referring to FIG. 4, a method of creating a template based on a text log is described. FIG. 4 is a diagram for describing the creation of a template according to the first embodiment.

As illustrated in FIG. 4, for example, the classification unit 151 can regard a character string expressed by “numeral/numeral/numeral” as a parameter, and use a character string obtained by deleting the parameter from a message as a template. In this case, the classification unit 151 creates a template “LINK-UP Interface” from a message “LINK-UP Interface 1/0/17” or a message “LINK-UP Interface 0/0/0” included in the text log 51 in FIG. 2. Furthermore, the classification unit 151 sets an ID to the created template, and adds the template to the dictionary information 142 in the storage unit 14. In this case, the classification unit 151 may add a wild card such as “*” to a part from which the parameter has been deleted.
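The template creation described above can be illustrated by the following minimal sketch. It assumes that a parameter is exactly a “numeral/numeral/numeral” token, and the names `PARAM`, `make_template`, and `add_template` are hypothetical names introduced only for this illustration; they are not part of the embodiment.

```python
import re

# Assumption: a parameter is a "numeral/numeral/numeral" token, as in FIG. 4.
PARAM = re.compile(r"\b\d+/\d+/\d+\b")


def make_template(message, wildcard="*"):
    """Replace each parameter with the wildcard and normalise whitespace.

    Passing wildcard="" simply deletes the parameter.
    """
    return " ".join(PARAM.sub(wildcard, message).split())


def add_template(dictionary, message, new_id):
    """Register the template of `message` under `new_id` unless already known."""
    template = make_template(message)
    if template not in dictionary.values():
        dictionary[new_id] = template
    return template


dictionary = {}
add_template(dictionary, "LINK-UP Interface 1/0/17", 601)
add_template(dictionary, "LINK-UP Interface 0/0/0", 602)  # same template, not added again
```

Both messages reduce to the same template, so only one entry is registered in the dictionary.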

Note that the character string regarded as a parameter by the classification unit 151 is not limited to the above-mentioned example. For example, the classification unit 151 may regard all numerals as parameters, and may regard a character string indicating an address as a parameter. When a message that does not match any template is found during the classification of messages, the classification unit 151 may create a new template based on the message.

In the first embodiment, templates in the dictionary information 142 are not necessarily required to be created by the classification unit 151, and may be created by a user in advance or may be automatically created by a device other than the analysis device 10.

The classification unit 151 classifies a message in a text log by determining whether a template in the dictionary information 142 matches the message. In this case, the classification unit 151 determines that a template matches a message when the template exactly matches the message or partially matches the message. The classification unit 151 gives the message an ID of the template determined to match.

For example, a template “LINK-UP Gigabitethernet *” with an ID “602” in FIG. 3 partially matches a message “LINK-UP Gigabitethernet 0/0/0” in the first line of the text log 51 in FIG. 2, and hence the classification unit 151 determines that the template “LINK-UP Gigabitethernet *” matches the message “LINK-UP Gigabitethernet 0/0/0”, and gives the ID “602” to the message “LINK-UP Gigabitethernet 0/0/0”.

For example, a template “network_monitor:[INFO]:network monitor detection started.” with an ID “701” in FIG. 3 exactly matches a message “network_monitor:[INFO]:network monitor detection started.” in the second line of the text log 51 in FIG. 2, and hence the classification unit 151 determines that the template “network_monitor:[INFO]:network monitor detection started.” matches the message “network_monitor:[INFO]:network monitor detection started.”, and gives the ID “701” to the message “network_monitor:[INFO]:network monitor detection started.”.

The creation of templates and the determination of matching with messages are not limited to the above-mentioned methods, and may be performed by a machine learning algorithm such as clustering. For example, the above-mentioned determination may be performed by a known method regarding log clustering (Reference document 1: Japanese Patent Application Laid-open No. 2015-36891).

The classification unit 151 classifies messages and gives IDs to create a classified text log 52. FIG. 5 is a diagram illustrating an example of a classified text log according to the first embodiment. As illustrated in FIG. 5, each record in the classified text log 52 includes an ID and the date of occurrence of a message. For example, a record in the first line of the classified text log 52 includes an ID “602” given to the message “LINK-UP Gigabitethernet 0/0/0” and the date of occurrence “2016/12/01T15:01:31”. For example, a record in the second line of the classified text log 52 includes an ID “701” given to the message “network_monitor:[INFO]:network monitor detection started.” and the date of occurrence “2016/12/01T15:01:53”.

The message “LINK-UP Gigabitethernet 0/0/0” in the first line and the message “LINK-UP Gigabitethernet 0/2/5” in the fourth line in FIG. 2 are different from each other, but both the messages match the template “LINK-UP Gigabitethernet *”, and hence the classification unit 151 gives the ID “602” to both the messages.
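The classification by template matching can be sketched as follows. This is only an illustration under the assumption that a "partial match" means a match in which the wild card “*” stands for any character string; the function names `template_to_regex`, `classify`, and `classify_log` are hypothetical. The templates are converted to regular expressions with `re.escape` so that characters such as “[” and “]” in a template are treated literally.

```python
import re

# Dictionary information with the IDs and templates quoted from FIG. 3.
DICTIONARY = {
    601: "LINK-UP Interface *",
    602: "LINK-UP Gigabitethernet *",
    701: "network_monitor:[INFO]:network monitor detection started.",
}


def template_to_regex(template):
    """Escape the literal parts of a template and let '*' match any string."""
    return re.compile(".*".join(re.escape(part) for part in template.split("*")))


def classify(message, dictionary):
    """Return the ID of the first template matching the message, or None."""
    for template_id, template in dictionary.items():
        if template_to_regex(template).fullmatch(message):
            return template_id
    return None


def classify_log(records, dictionary):
    """Turn (date, message) records into (ID, date) records of a classified log."""
    return [(classify(message, dictionary), date) for date, message in records]


text_log = [
    ("2016/12/01T15:01:31", "LINK-UP Gigabitethernet 0/0/0"),
    ("2016/12/01T15:01:53", "network_monitor:[INFO]:network monitor detection started."),
]
classified = classify_log(text_log, DICTIONARY)
```

A message for which `classify` returns `None` corresponds to the case in which no template matches, where a new template may be created from the message.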

Based on the dates of occurrence attached to the messages, the creation unit 152 creates a log matrix that is a matrix indicating an appearance distribution of the messages in the text log 51 for each predetermined duration for each ID. Specific examples of the appearance distribution include, but are not limited to, the frequency of appearance of each ID, values processed by taking the logarithm of the frequency of appearance, and the presence/absence of appearance (value of 1 for appearance and 0 for non-appearance). In the following embodiment, an example in which the frequency of appearance is used as the appearance distribution is described. The creation unit 152 acquires the dates of occurrence and IDs of messages used to create a log matrix from the classified text log 52.
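The three examples of the appearance distribution can be written as a small sketch. The function name `appearance_value` and the `mode` strings are hypothetical, and the use of log(1 + count) for the logarithmic variant is an assumption made here so that a count of zero remains defined; the embodiment does not specify how a zero frequency is handled.

```python
import math


def appearance_value(count, mode="frequency"):
    """Convert a raw appearance count into one element of the log matrix."""
    if mode == "frequency":
        return count
    if mode == "log":
        # Assumption: log(1 + count) is used so that count == 0 maps to 0.
        return math.log1p(count)
    if mode == "presence":
        return 1 if count > 0 else 0
    raise ValueError("unknown mode: " + mode)
```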

FIG. 6 is a diagram illustrating an example of a log matrix according to the first embodiment. Each row in the log matrix corresponds to an ID in the dictionary information 142. Each column in the log matrix corresponds to a predetermined duration of 10 minutes. In the following description, the rows on the upper part of a log matrix Y illustrated in FIG. 6 are 601st to 604th rows, and the rows on the lower part are 701st to 706th rows. An element in the p-th row and the q-th column in the log matrix Y is represented by an element (p,q). For example, the value of an element (601,1) in the log matrix Y is “2”.

The 601st row in the log matrix Y corresponds to the ID “601”. The 602nd row in the log matrix Y corresponds to the ID “602”. The 701st row in the log matrix Y corresponds to the ID “701”. The 702nd row in the log matrix Y corresponds to the ID “702”. The first column in the log matrix Y corresponds to 10 minutes between 2016/12/01T15:00:01 and 2016/12/01T15:10:00. The second column in the log matrix Y corresponds to 10 minutes between 2016/12/01T15:10:01 and 2016/12/01T15:20:00. In FIG. 5, all IDs and dates of occurrence of messages with the dates of occurrence ranging from 2016/12/01T15:00:01 to 2016/12/01T15:10:00 are illustrated.

In the log matrix, the row number and the ID are not necessarily required to match each other as long as it can be grasped which ID corresponds to each row. In the log matrix, IDs and times need not be consecutive, and there may be a missing ID or time.

The creation unit 152 counts the number of occurrences of messages every 10 minutes for each ID, and sets the counted number to the value of each element in the log matrix Y. For example, as illustrated in FIG. 5, the number of messages with the ID “601” is two among the messages with the dates of occurrence ranging from 2016/12/01T15:00:01 to 2016/12/01T15:10:00. Thus, as illustrated in FIG. 6, the creation unit 152 sets the value of the element (601,1) in the log matrix Y to “2”. For example, as illustrated in FIG. 5, the number of messages with the ID “701” is one among the messages with the dates of occurrence ranging from 2016/12/01T15:00:01 to 2016/12/01T15:10:00. Thus, as illustrated in FIG. 6, the creation unit 152 sets the value of the element (701,1) in the log matrix Y to “1”. Similarly, the creation unit 152 counts the numbers of occurrences of messages for each ID and each duration, and sets the value of each element in the log matrix Y.
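The counting described above can be sketched as follows. The sketch assumes that the first column covers 2016/12/01T15:00:01 to 2016/12/01T15:10:00, so that each duration is open at its start and closed at its end; the names `column_index` and `build_log_matrix` are hypothetical, and the records for the ID “601” use hypothetical dates of occurrence chosen only to exercise the column boundaries. Rows are held in a dictionary keyed by ID, which is one way of keeping track of which ID corresponds to each row.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y/%m/%dT%H:%M:%S"


def column_index(date, start, width_seconds=600):
    """0-based column; column k covers (start + k*width, start + (k+1)*width]."""
    elapsed = datetime.strptime(date, FMT) - datetime.strptime(start, FMT)
    return (int(elapsed.total_seconds()) - 1) // width_seconds


def build_log_matrix(classified, ids, start, n_columns, width_seconds=600):
    """Count messages for each ID and each duration; returns {ID: [counts per column]}."""
    counts = Counter(
        (template_id, column_index(date, start, width_seconds))
        for template_id, date in classified
    )
    return {i: [counts[(i, c)] for c in range(n_columns)] for i in ids}


# Hypothetical classified records (ID, date of occurrence).
classified = [
    (601, "2016/12/01T15:00:01"),
    (601, "2016/12/01T15:10:00"),
    (701, "2016/12/01T15:01:53"),
    (601, "2016/12/01T15:10:01"),
]
Y = build_log_matrix(classified, [601, 701], "2016/12/01T15:00:00", n_columns=2)
```

With these records, the ID “601” is counted twice in the first column and once in the second, and the ID “701” is counted once in the first column.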

The output unit 13 outputs an image visualizing the log matrix Y. FIG. 7 is a diagram illustrating an example of an image visualizing a text log for one month as a log matrix having a duration of one hour according to the first embodiment. The vertical axis in FIG. 7 represents IDs. The horizontal axis in FIG. 7 represents time. In FIG. 7, an element having a higher appearance distribution of messages is indicated by a darker color.

The log matrix Y may be created based on a plurality of text logs with different properties. For example, the log matrix Y may be created based on a text log output in accordance with syslog and a text log output from an OpenStack system. Even in such cases, the output unit 13 can visualize the log matrix Y as one image as illustrated in FIG. 7.

In a region 531 in FIG. 7, a text log is output every day, and hence it is understood that an ID group based on regular processing such as daily processing appears. Similarly, in a region 532, an ID group based on processing constantly executed appears. In a region 533, a text log is output for five days excluding two days, and hence it is understood that an ID group based on processing related to business tasks on weekdays appears. On the other hand, in this system, a system failure occurred on the second day, and an ID group that seems to have been output due to the failure appears in a region 534. Thus, it is particularly important for anomaly detection to extract the ID group appearing in the region 534.

The pattern extraction unit 153 extracts a combination of IDs as a pattern from the log matrix created by the creation unit 152. Specifically, the pattern extraction unit 153 decomposes the log matrix by nonnegative matrix factorization (NMF), and extracts, as patterns, a basis matrix having a pattern as a combination of IDs in a column vector and a weighting matrix having a row vector indicating how frequently the pattern appears for each predetermined duration. It is known that the matrix decomposition by NMF has characteristics that a pattern appearing relatively frequently is extracted. Note that the method of matrix decomposition is not limited to NMF, and methods such as principal component analysis and independent component analysis can be used.

Note that, in the case where a matrix includes only non-negative values, methods such as non-negative principal component analysis and non-negative independent component analysis can be used as the method of matrix decomposition.

First, the decomposition of a log matrix is described with reference to FIG. 8. FIG. 8 is a diagram for describing the decomposition of a log matrix according to the first embodiment. The pattern extraction unit 153 uses NMF to decompose a log matrix Y into a basis matrix H and a weighting matrix U. For example, the pattern extraction unit 153 can perform NMF by the method disclosed in Non Patent Literature 2. Note that the log matrix Y, the basis matrix H, and the weighting matrix U in the first embodiment correspond to a matrix X, a matrix T, and a matrix V described in Non Patent Literature 2, respectively. The relation of the log matrix Y, the basis matrix H, and the weighting matrix U can be expressed by Equation (1) where E is an error matrix.

Y=HU+E  (1)

Each row in the basis matrix H corresponds to each row in the log matrix Y, that is, an ID. Each column in the weighting matrix U corresponds to each column in the log matrix Y, that is, a duration. Each column in the basis matrix H and each row in the weighting matrix U correspond to each pattern as a combination of IDs. In one example, the number of columns in the basis matrix H in the first embodiment is 10, and basis numbers 1 to 10 are set as combinations of IDs appearing in the first to 10th columns in the basis matrix H.

It can be said that the basis matrix H indicates which ID is included and to what degree the ID is included in a pattern as a combination of IDs corresponding to a message group repeatedly appearing concurrently. It can be said that the weighting matrix U indicates to what degree and in which time zone a pattern as a combination of IDs appearing in each column of the basis matrix H has occurred.
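The decomposition Y = HU + E can be illustrated with a minimal sketch. It uses the widely known multiplicative update rules for the squared-error objective, which is an assumption made for illustration and is not necessarily the exact procedure of Non Patent Literature 2. The helper names (`matmul`, `transpose`, `nmf`, `frob_error`), the random initialisation, the iteration count, and the small constant `eps` are all hypothetical choices.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(Y, k, iters=200, seed=0, eps=1e-9):
    """Factorise a non-negative n x m matrix Y into H (n x k) and U (k x m)."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    H = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    U = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # U <- U * (H^T Y) / (H^T H U), elementwise
        Ht = transpose(H)
        num, den = matmul(Ht, Y), matmul(matmul(Ht, H), U)
        U = [[U[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # H <- H * (Y U^T) / (H U U^T), elementwise
        Ut = transpose(U)
        num, den = matmul(Y, Ut), matmul(H, matmul(U, Ut))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return H, U


def frob_error(Y, H, U):
    """Frobenius norm of the error matrix E = Y - HU."""
    R = matmul(H, U)
    return sum((y - r) ** 2 for yr, rr in zip(Y, R) for y, r in zip(yr, rr)) ** 0.5
```

Each column of the returned `H` plays the role of one pattern (a weighted combination of IDs), and the corresponding row of `U` indicates in which durations that pattern appears. Because the updates only multiply non-negative quantities, `H` and `U` stay non-negative throughout.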

The output unit 13 can output an image visualizing the basis matrix H as illustrated in FIG. 9. FIG. 9 is a diagram illustrating an example of an image visualizing a basis matrix according to the first embodiment, illustrating a basis matrix obtained by executing NMF on the log matrix in FIG. 7. It is confirmed from FIG. 9 that an ID group constantly appearing in the log matrix as in the region 532 in FIG. 7 is included in any of the patterns of the basis numbers 1 to 10. For example, the pattern of the basis number 1 includes the ID group in the region 531 in FIG. 7. The pattern of the basis number 4 includes the ID group in the region 533. In this manner, the frequently appearing ID groups are extracted as patterns, but a less frequently appearing ID group (such as a failure) as in the region 534 in FIG. 7 is not extracted.

The output unit 13 can output an image visualizing the weighting matrix U as illustrated in FIG. 10. FIG. 10 is a diagram illustrating an example of an image visualizing a weighting matrix according to the first embodiment, illustrating a weighting matrix obtained by executing NMF on the log matrix in FIG. 7. As illustrated in FIG. 10, the pattern of the basis number 1 is a pattern appearing every day, which matches the confirmation from FIG. 9 that this pattern includes an ID group of daily processing. The pattern of the basis number 4 appears five times in succession, followed by a longer interval before the next five appearances, that is, it appears on weekdays, which matches the confirmation from FIG. 9 that this pattern includes IDs related to weekday business tasks. The patterns appearing on weekdays correspond to the basis numbers 6 and 9 in addition to the basis number 4. On the other hand, it is understood that the patterns of the basis numbers 7, 8, and 10 are patterns constantly appearing.

The appearance timings of frequently appearing ID groups, such as ID groups periodically appearing and ID groups constantly appearing, are clearly indicated in FIG. 10. On the other hand, the appearance timings of ID groups occurring due to an anomaly, such as a failure, as in the region 534 in FIG. 7 are not clearly indicated in FIG. 10. The ID group included in the region 534 in FIG. 7 is not indicated in FIG. 9, and it is understood that the ID group has not been extracted as a pattern.

The removal unit 154 removes frequent patterns from the log matrix. Specifically, the removal unit 154 removes patterns by subtracting, from the log matrix, any one of the product of the basis matrix and a significant weighting matrix obtained by replacing the values of elements in each pattern in the weighting matrix smaller than a predetermined threshold with 0, the product of a significant basis matrix obtained by replacing the values of elements in each pattern in the basis matrix smaller than a predetermined threshold with 0 and the weighting matrix, and the product of the significant basis matrix and the significant weighting matrix. Note that, when an element in the log matrix becomes negative after the calculation, the element can be replaced with 0 so that the log matrix remains non-negative.

For example, the removal unit 154 removes frequent patterns from the log matrix by Equation (2). Y is a log matrix, H is a basis matrix, U is a weighting matrix, and E is an error matrix. Y is decomposed into HU and E. H_(freq) is a matrix obtained by replacing, for each column in H, the value in a row smaller than a predetermined threshold with 0, that is, a significant basis matrix. U_(freq) is a matrix obtained by replacing, for each row in U, the value in a column smaller than a predetermined threshold with 0, that is, a significant weighting matrix. Y′_(e1) is a log matrix from which the frequent patterns have been removed.

$$Y'_{e1} = Y - H_{e1} U_{e1} \qquad (2)$$

where

$$H_{e1} = \begin{cases} H \\ H_{freq} \end{cases}, \qquad U_{e1} = \begin{cases} U \\ U_{freq} \end{cases}$$

As expressed by Equation (2), H_(e1) is H or H_(freq). U_(e1) is U or U_(freq). When H_(e1) is H, U_(e1) is U_(freq), and when U_(e1) is U, H_(e1) is H_(freq). In other words, H_(e1)U_(e1) in Equation (2) includes at least one of the significant basis matrix and the significant weighting matrix. In this manner, H_(e1)U_(e1) in Equation (2) is expressed by any one of H_(freq)U, HU_(freq), and H_(freq)U_(freq).

For example, when H_(e1) is H and U_(e1) is U_(freq), the removal unit 154 removes frequent patterns by using Equation (3). Note that Y′ is a log matrix from which the frequent patterns have been removed, that is, Y′_(e1).

$$\begin{aligned} Y' &= Y - H U_{freq} \\ &= HU + E - H U_{freq} \\ &= H\left(U - U_{freq}\right) + E \end{aligned} \qquad (3)$$

FIGS. 11 and 12 are diagrams for describing the removal of frequent patterns according to the first embodiment. An example in which a basis matrix and a significant weighting matrix are used to remove frequent patterns by Equation (3) is described. As illustrated in FIG. 11, first, the removal unit 154 calculates the product HU_(freq) of the basis matrix H and the significant weighting matrix U_(freq) in which the values of elements not to be removed are 0.

Next, as illustrated in FIG. 12, the removal unit 154 subtracts, from the log matrix Y, the log matrix HU_(freq) for removing frequent patterns. In this manner, the removal unit 154 obtains a log matrix Y′ from which the frequent patterns have been removed. The pattern extraction unit 153 further extracts frequent patterns from the log matrix from which the frequent patterns have been removed by the removal unit 154. In this case, a basis matrix and a weighting matrix extracted from the log matrix Y′ by the pattern extraction unit 153 are H′ and U′, respectively.
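The removal by Equation (3), including the replacement of negative elements with 0, can be sketched as follows. The sketch assumes a single threshold applied to all patterns, which is a simplification for illustration; the function names and the toy values of H, U, and Y are hypothetical and are not taken from the figures.

```python
def matmul(A, B):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def significant(M, threshold):
    """Keep values at or above the threshold; replace smaller values with 0."""
    return [[v if v >= threshold else 0.0 for v in row] for row in M]


def remove_frequent(Y, H, U, threshold):
    """Compute Y' = Y - H U_freq and replace negative elements with 0."""
    product = matmul(H, significant(U, threshold))
    return [
        [max(y - p, 0.0) for y, p in zip(y_row, p_row)]
        for y_row, p_row in zip(Y, product)
    ]


# Hypothetical toy data: one pattern containing only the first ID,
# strongly present in the first and third durations.
H = [[1.0], [0.0]]
U = [[5.0, 0.2, 5.0]]
Y = [[5.0, 1.0, 5.0], [0.0, 3.0, 0.0]]
Y_removed = remove_frequent(Y, H, U, threshold=1.0)
```

In this toy example the frequent occurrences of the first ID are subtracted away, while the isolated counts in the second duration remain in `Y_removed` and are therefore available to a second pattern extraction.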

The output unit 13 can output an image visualizing the basis matrix H′ as illustrated in FIG. 13. FIG. 13 is a diagram illustrating an example of an image visualizing a basis matrix from which frequent patterns have been removed according to the first embodiment, illustrating a basis matrix obtained as a result of performing the second pattern extraction by using the log matrix obtained by removing, from the log matrix in FIG. 7, the patterns obtained as FIG. 9 and FIG. 10. As compared to the pattern in FIG. 9 in the first pattern extraction, IDs 200 to 500 that have not appeared in the first pattern extraction appear in the pattern of the basis number 8. These IDs are an ID group that has appeared only during the system failure on the second day.

The output unit 13 can output an image visualizing the weighting matrix U′ as illustrated in FIG. 14. FIG. 14 is a diagram illustrating an example of an image visualizing a weighting matrix from which frequent patterns have been removed according to the first embodiment, illustrating a weighting matrix obtained as a result of performing the second pattern extraction by using the log matrix obtained by removing, from the log matrix in FIG. 7, the patterns obtained as FIG. 9 and FIG. 10. As illustrated in FIG. 14, the pattern of the basis number 8 including the ID group upon the failure in FIG. 13 appears on the second day. The pattern of the basis number 5 strongly appears mainly on the second day, and can be said to be a pattern possibly relating to the failure.

While the case where a significant basis matrix or a significant weighting matrix is used to remove frequent patterns has been described, the removal unit 154 may remove frequent patterns by using a non-significant basis matrix or a non-significant weighting matrix. Specifically, the removal unit 154 uses, as a matrix obtained by removing patterns from a log matrix, any one of: the product of a basis matrix and a non-significant weighting matrix obtained by replacing the values of elements in each pattern in a weighting matrix equal to or larger than a predetermined threshold with 0; the product of a non-significant basis matrix obtained by replacing the values of elements in each pattern in a basis matrix equal to or larger than a predetermined threshold with 0 and a weighting matrix; and the product of the non-significant basis matrix and the non-significant weighting matrix. Note that, when an element in the log matrix becomes negative after the calculation, the element can be replaced with 0 so as to be a non-negative value.

Specifically, the removal unit 154 removes frequent patterns from a log matrix by Equation (4). Similarly to Equation (3), Y is a log matrix, H is a basis matrix, U is a weighting matrix, and E is an error matrix. Y is decomposed into HU and E. H_(rare) is a matrix obtained by replacing, for each column in H, the value in a row equal to or larger than a predetermined threshold with 0, that is, a non-significant basis matrix. U_(rare) is a matrix obtained by replacing, for each row in U, the value in a column equal to or larger than a predetermined threshold with 0, that is, a non-significant weighting matrix. Y′_(e2) is a log matrix from which the frequent patterns have been removed.

$\begin{matrix}{Y_{e2}^{\prime} = H_{e2}U_{e2}} & (4)\end{matrix}$

where

$H_{e2} = \begin{cases} H \\ H_{rare} \end{cases},\quad U_{e2} = \begin{cases} U \\ U_{rare} \end{cases}$

As expressed by Equation (4), H_(e2) is H or H_(rare). U_(e2) is U or U_(rare). When H_(e2) is H, U_(e2) is U_(rare), and when U_(e2) is U, H_(e2) is H_(rare). In other words, H_(e2)U_(e2) in Equation (4) includes at least one of the non-significant basis matrix and the non-significant weighting matrix. In this manner, H_(e2)U_(e2) in Equation (4) is expressed by any one of H_(rare)U, HU_(rare), and H_(rare)U_(rare).
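As a minimal sketch of Equation (4), the three admissible products can be computed as follows. The helper names `matmul` and `zero_at_or_above`, the toy matrices, and the single global thresholds (0.5 for H and 1.0 for U) are assumptions made for illustration only; the description allows the threshold to be determined per column of H or per row of U.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def zero_at_or_above(matrix, threshold):
    """Replace every element equal to or larger than threshold with 0."""
    return [[0.0 if v >= threshold else v for v in row] for row in matrix]


# Toy basis matrix H (3 IDs x 2 patterns) and
# weighting matrix U (2 patterns x 3 durations); values are hypothetical.
H = [[0.9, 0.1],
     [0.8, 0.0],
     [0.1, 0.7]]
U = [[5.0, 4.0, 0.5],
     [0.2, 0.1, 3.0]]

H_rare = zero_at_or_above(H, 0.5)   # non-significant basis matrix
U_rare = zero_at_or_above(U, 1.0)   # non-significant weighting matrix

# Each product contains at least one non-significant factor.
Y_e2_candidates = {
    "H_rare U": matmul(H_rare, U),
    "H U_rare": matmul(H, U_rare),
    "H_rare U_rare": matmul(H_rare, U_rare),
}
```

Because every factor is non-negative, each candidate is itself non-negative and can be passed to a further pattern extraction without clipping.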

The removal unit 154 may further remove frequent patterns from the log matrix Y′. In this case, the pattern extraction unit 153 further extracts frequent patterns from the log matrix from which the frequent patterns have been removed by the removal unit 154. The pattern extraction unit 153 and the removal unit 154 may repeat the extraction of frequent patterns and the removal of frequent patterns until a predetermined condition is satisfied.

The determination unit 155 calculates the degree of importance for each of elements included in the frequent pattern extracted from the log matrix by the pattern extraction unit 153, and determines whether the degree of importance is equal to or higher than a predetermined threshold. The determination unit 155 may further calculate the degree of importance for each of elements included in the frequent pattern further extracted by the pattern extraction unit 153 from the log matrix from which the frequent pattern has been removed by the removal unit 154, and determine whether the degree of importance is equal to or higher than a predetermined threshold. Now, an example of the case where the determination unit 155 calculates the degree of importance for each ID of elements included in a frequent pattern is described.

To extract IDs characterizing each column in a basis matrix, IDs corresponding to elements having large values can be preferentially extracted. However, IDs included in a plurality of columns may be improper as IDs characterizing each column. For example, when there is an ID that constantly appears frequently before or after a frequent pattern is removed, the ID may be included in a plurality of columns in a basis matrix. This is easily confirmed from FIG. 9. Even if such an ID is extracted, it is difficult to utilize the ID for anomaly detection.

In view of the above, the determination unit 155 calculates, for each ID included in a frequent pattern, the degree of importance so as to be higher as the value of an element for each ID in the pattern becomes higher and be lower as the number of frequent patterns including the ID becomes larger, and determines whether the degree of importance is equal to or higher than a predetermined threshold. Examples of the determination method based on such a degree of importance include TF-IDF (Reference document 2: “tf-idf”, [online], Wikipedia, [searched on Jan. 26, 2017], from ja.wikipedia.org). The determination unit 155 can calculate the degree of importance by the method based on TF-IDF. For example, when the basis matrix H is a matrix D, the determination unit 155 uses Equations (5-1) to (5-3) to calculate the degree of importance tfidf(t,d,D) of an ID t in a column d in the matrix D.

$\begin{matrix}{{tf}(t,d) = f_{t,d}} & (5\text{-}1)\end{matrix}$

$\begin{matrix}{{idf}(t,D) = \log\frac{N}{n_{t}}} & (5\text{-}2)\end{matrix}$

$\begin{matrix}{{tfidf}(t,d,D) = {tf}(t,d) \cdot {idf}(t,D)} & (5\text{-}3)\end{matrix}$

f_(t,d) is the value of an element with the ID of t in the column d. N is the number of bases, that is, the number of columns in the basis matrix H. n_(t) is the number of columns in the basis matrix H in each of which the value of the element of the ID t is equal to or larger than a predetermined threshold.
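Equations (5-1) to (5-3) can be sketched as follows for a basis matrix whose rows correspond to IDs and whose columns correspond to bases. The function name `tfidf_importance`, the parameter `count_threshold`, and the toy matrix are hypothetical. The handling of an ID for which n_(t) is 0 (its importance is set to 0) is an assumption, since the description does not address that case.

```python
import math


def tfidf_importance(D, count_threshold):
    """Return importance[t][d] = tf(t, d) * idf(t, D) for matrix D (rows = IDs)."""
    n_cols = len(D[0])                      # N: number of bases (columns)
    importance = []
    for row in D:
        # n_t: number of columns in which this ID reaches the threshold.
        n_t = sum(1 for v in row if v >= count_threshold)
        # Assumption: an ID that reaches the threshold in no column gets idf = 0.
        idf = math.log(n_cols / n_t) if n_t > 0 else 0.0
        importance.append([v * idf for v in row])   # tf(t, d) = f_{t,d}
    return importance


# Toy basis matrix (hypothetical values): 3 IDs x 3 bases.
H = [[0.9, 0.8, 0.9],   # ID 0: large in every column (constantly frequent)
     [0.0, 0.7, 0.0],   # ID 1: large only in column 1
     [0.6, 0.0, 0.1]]   # ID 2: large only in column 0

scores = tfidf_importance(H, count_threshold=0.5)
```

In this toy case the ID that is large in every column receives a degree of importance of 0, whereas IDs that are large in only one column keep a positive score in that column, which matches the intent of penalizing IDs shared by many patterns.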

The determination unit 155 may calculate the degree of importance only for a particular ID. The determination unit 155 may calculate a first threshold for all element values in each column vector in the basis matrix H by using Otsu's method (Reference document 3: Nobuyuki Otsu: “A threshold selection method from gray-level histograms,” Automatica 11.285-296 (1975), pp. 23-27), and calculate the degree of importance only for an ID having the value of the element equal to or larger than the first threshold.
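A minimal sketch of this thresholding is shown below. Otsu's method selects the split that maximizes the between-class variance; the variant here works directly on the sorted element values rather than on a histogram, which is an assumption made to keep the example short. The function name `otsu_threshold` and the sample column are hypothetical.

```python
def otsu_threshold(values):
    """Return the split value that maximizes the between-class variance.

    Elements equal to or larger than the returned value form the upper class.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_var, best_thr = -1.0, xs[0]
    lower_sum = 0.0
    for i in range(1, n):
        lower_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue                      # no valid split between equal values
        w0, w1 = i / n, (n - i) / n       # class weights
        mu0 = lower_sum / i               # mean of the lower class
        mu1 = (total - lower_sum) / (n - i)   # mean of the upper class
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_thr = between, xs[i]
    return best_thr


# One column vector of a basis matrix (hypothetical values, index = ID).
column = [0.02, 0.05, 0.01, 0.90, 0.85, 0.03]
first_threshold = otsu_threshold(column)
selected_ids = [i for i, v in enumerate(column) if v >= first_threshold]
```

Only the IDs in `selected_ids` would then be passed to the calculation of the degree of importance, which reduces the number of elements to be evaluated.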

The determination unit 155 may calculate a second threshold for all element values in each row vector by using Otsu's method based on the frequency of appearance of each pattern for each predetermined duration, that is, a weighting matrix, and determine whether the frequency of appearance for each predetermined duration is equal to or larger than the second threshold.

The determination unit 155 may calculate the degree of importance for a time. Specifically, when the weighting matrix U is a matrix D, the determination unit 155 uses Equations (5-1) to (5-3) to calculate the degree of importance tfidf(t,d,D) of a time t in a row d in the matrix D.

f_(t,d) is the value of an element with the time of t in the row d. N is the number of bases, that is, the number of rows in the weighting matrix U. n_(t) is the number of rows in the weighting matrix U in each of which the value of the element of the time t is equal to or larger than a predetermined threshold.

The determination unit 155 may calculate the degree of importance only for a particular time. The determination unit 155 may calculate a third threshold for all elements in each row vector in the weighting matrix U by using Otsu's method, and calculate the degree of importance only for a time having the value of the element equal to or larger than the third threshold.

The above-mentioned calculation of the degree of importance or determination using the threshold enables profiling as illustrated in FIG. 15 to be performed. FIG. 15 is a diagram for describing the profiling of a text log according to the first embodiment.

For example, by performing the determination using the second threshold for a weighting matrix, extracting a duration equal to or larger than the second threshold as a principal element, and collating the duration with external information such as a failure occurrence time, the type of a pattern can be estimated. In this manner, patterns can be classified into a pre-fault pattern, a post-fault pattern, a regular pattern, and an irregular pattern irrelevant to a failure.

By extracting an ID whose degree of importance is equal to or higher than a predetermined value from a basis matrix and collating each message of the ID having the principal element (that is, template in dictionary information 142) with external information such as failure details and a failure site, whether the pattern is relevant to a failure can be estimated. For example, it can be determined whether a pattern that has been classified as a pre-fault pattern from the weighting matrix is a fault predictive pattern indicating a sign of a fault, whether a pattern that has been classified as a post-fault pattern is a fault propagation pattern indicating the influence of a fault, whether a pattern that has been classified as a regular pattern is a steady processing pattern indicating normal processing, and whether a pattern that has been classified as an irregular pattern irrelevant to a fault is a construction pattern indicating that a construction was performed.

The significant log extraction unit 156 and the sequence extraction unit 157 extract predetermined information on principal elements from the classified text log 52. Note that the significant log extraction unit 156 and the sequence extraction unit 157 are an example of an information extraction unit.

The significant log extraction unit 156 generates a significant log by extracting, from the classified text log 52, a record including only IDs that are principal elements of a basis matrix extracted by the determination unit 155. The significant log extraction unit 156 may generate a significant log by limiting to, in addition to the ID as a principal element, a time corresponding to a duration as a principal element in the weighting matrix extracted by the determination unit 155. The significant log extraction unit 156 may generate a significant log for each pattern. The significant log extraction unit 156 may generate a significant log by converting the date of occurrence into another time format such as UNIX (registered trademark) time.
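The generation of a significant log can be sketched as follows. The record layout (a pair of a date of occurrence and an ID), the function name `extract_significant_log`, the sample records, and the representation of a principal duration as a half-open time window are all assumptions for illustration.

```python
from datetime import datetime, timezone


def extract_significant_log(classified_log, principal_ids, principal_windows=None):
    """Keep records whose ID is a principal element, optionally limited to
    principal durations, and convert the date of occurrence to UNIX time."""
    significant = []
    for ts, msg_id in classified_log:
        if msg_id not in principal_ids:
            continue
        if principal_windows is not None and not any(
                start <= ts < end for start, end in principal_windows):
            continue
        significant.append((int(ts.timestamp()), msg_id))
    return significant


def utc(*args):
    """Build a timezone-aware UTC datetime (helper for the sample data)."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical classified text log: (date of occurrence, ID).
classified_log = [
    (utc(2017, 4, 3, 10, 0, 0), 101),
    (utc(2017, 4, 3, 10, 0, 5), 704),
    (utc(2017, 4, 3, 10, 0, 7), 705),
    (utc(2017, 4, 3, 12, 0, 0), 706),
]
principal_ids = {704, 705, 706}
principal_windows = [(utc(2017, 4, 3, 10, 0, 0), utc(2017, 4, 3, 11, 0, 0))]

significant_log = extract_significant_log(
    classified_log, principal_ids, principal_windows)
```

Here the record with ID 101 is dropped because the ID is not a principal element, and the record with ID 706 is dropped because it falls outside the principal duration.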

The sequence extraction unit 157 extracts, from the significant log generated by the significant log extraction unit 156, a particular sequence among sequences indicating the order of appearance of IDs.

Specifically, the sequence extraction unit 157 counts the number of appearances of each sequence of IDs included in the significant log, and extracts a sequence having a large number of appearances. The extraction of sequences can be determined by using any method such as the extraction of a sequence having the largest number of appearances, the extraction of the top k sequences having a larger number of appearances, and the extraction of sequences whose number of appearances is equal to or larger than a designated number.

The sequence extraction unit 157 may extract only a sequence that has been determined to satisfy a predetermined condition among sequences having a large number of appearances. Examples of the predetermined condition include, but are not limited to, the length of the sequence and the sequence lapse time indicating the lapse time from the first ID to the last ID of the sequence. Regarding these conditions, the range of extraction may be limited by determining a threshold similarly to the number of appearances.

For example, when the determination unit 155 extracts “704”, “705”, and “706” as IDs whose calculated degrees of importance are equal to or larger than a predetermined value, the significant log extraction unit 156 extracts a record including the IDs of “704”, “705”, and “706” from the classified text log 52 to generate a significant log. The sequence extraction unit 157 further extracts a sequence having a large number of appearances from the significant log.

The sequence extraction unit 157 can use algorithms of sequential pattern mining (Reference document 4: J. Pei et al. “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proc. of The 17th Int'l Conf. on Data Engineering, pp. 215-224 (2001) (from idb.csie.ncku.edu.tw)) and episode mining (Reference document 5: A. Achar et al. “Pattern-growth based frequent serial episode discovery,” Data and Knowledge Engineering, 87: pp. 91-108 (2013)) as the method for extracting sequences.

FIG. 16 is a diagram for describing the extraction of sequences according to the first embodiment. This example is a sequence extraction example based on a combination of IDs without repetition. As illustrated in FIG. 16, the number of combinations of sequences of messages with the IDs of “704”, “705”, and “706” is 3!, that is, 6. The sequence extraction unit 157 counts the number of appearances of each of the sequences. For extracting a sequence having the largest number of appearances, the sequence extraction unit 157 extracts a sequence “705-704-706” having the number of appearances of 10. A user can estimate the cause of a failure based on the sequence extracted by the sequence extraction unit 157.
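The counting of sequences without repetition can be sketched as follows. The description does not fix how occurrences are delimited, so this example assumes that a sequence is a contiguous window of IDs in the significant log; the function name `count_sequences` and the ID stream are hypothetical and do not reproduce the counts of FIG. 16.

```python
from collections import Counter


def count_sequences(id_stream, length):
    """Count contiguous windows of `length` IDs in which no ID repeats."""
    counts = Counter()
    for i in range(len(id_stream) - length + 1):
        window = tuple(id_stream[i:i + length])
        if len(set(window)) == length:     # keep sequences without repetition
            counts[window] += 1
    return counts


# Hypothetical order of appearance of IDs in a significant log.
stream = [705, 704, 706, 705, 704, 706, 704, 705, 706]
counts = count_sequences(stream, 3)

# Extraction of the sequence having the largest number of appearances.
top_sequence, top_count = counts.most_common(1)[0]
```

The top k sequences can be obtained with `counts.most_common(k)`, and a designated minimum number of appearances can be applied by filtering `counts.items()`.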

Processing in First Embodiment

Referring to FIG. 17, the flow of processing by the analysis device 10 is described. FIG. 17 is a flowchart illustrating the flow of processing by the analysis device according to the embodiment. As illustrated in FIG. 17, first, the classification unit 151 classifies messages of a text log for each type, and gives IDs (Step S101). The creation unit 152 creates a log matrix based on the dates of occurrence in the text log and the IDs given by the classification unit 151 (Step S102).

Next, the pattern extraction unit 153 decomposes the log matrix to extract a basis matrix and a weighting matrix (Step S103). For example, the pattern extraction unit 153 decomposes the log matrix by NMF to extract a basis matrix in which a pattern as a combination of IDs is a column vector and a weighting matrix in which a row vector indicates how frequently the pattern has appeared for each predetermined duration.
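As an illustration of Step S103, the following sketch factorizes a small log matrix with the widely used multiplicative update rules for NMF. The description does not state which NMF algorithm is used, so the update rule, the function names, the random initialization, the iteration count, and the toy matrix are all assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(Y, H, U):
    """Squared Frobenius norm of the error matrix E = Y - HU."""
    HU = matmul(H, U)
    return sum((Y[i][j] - HU[i][j]) ** 2
               for i in range(len(Y)) for j in range(len(Y[0])))


def nmf(Y, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative Y (IDs x durations) into H (IDs x rank) and
    U (rank x durations) by multiplicative updates."""
    rng = random.Random(seed)
    H = [[rng.random() + 0.1 for _k in range(rank)] for _row in Y]
    U = [[rng.random() + 0.1 for _col in Y[0]] for _k in range(rank)]
    for _ in range(iterations):
        Ht = transpose(H)
        num, den = matmul(Ht, Y), matmul(matmul(Ht, H), U)
        U = [[U[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(len(U[0]))] for k in range(rank)]
        Ut = transpose(U)
        num, den = matmul(Y, Ut), matmul(H, matmul(U, Ut))
        H = [[H[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(len(H))]
    return H, U


# Toy log matrix (hypothetical): IDs 0 and 1 co-occur in durations 0 and 1,
# IDs 2 and 3 co-occur in durations 2 and 3.
Y = [[4.0, 4.0, 0.0, 0.0],
     [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 6.0, 6.0]]
H, U = nmf(Y, rank=2)
```

Each column of `H` then represents a combination of IDs, and the corresponding row of `U` indicates in which durations that combination appears.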

When it is determined that a predetermined condition is satisfied and a pattern needs to be removed (Yes at Step S104), the removal unit 154 removes a frequent pattern from the log matrix (Step S105). For example, the removal unit 154 removes a frequent pattern by subtracting, from the log matrix, the product of the basis matrix and a significant weighting matrix obtained by replacing the value of each element in the weighting matrix smaller than a predetermined threshold with 0. Alternatively, the removal unit 154 removes a frequent pattern by taking the product of the basis matrix and a non-significant weighting matrix obtained by replacing the value of each element in the weighting matrix equal to or larger than a predetermined value with 0. The pattern extraction unit 153 further extracts a pattern (Step S103).

When it is not determined that a predetermined condition is satisfied and a frequent pattern needs to be removed (No at Step S104), the determination unit 155 determines a principal element in the basis matrix or the weighting matrix or both the matrices (Step S106). For example, the determination unit 155 calculates, for each ID included in the pattern, the degree of importance so as to be higher as the value of the element of the ID becomes higher and be lower as the number of frequent patterns including the ID becomes larger, determines whether the degree of importance is equal to or higher than a predetermined threshold, and extracts a principal element in the basis matrix. The determination unit 155 calculates a second threshold for each duration included in the pattern by using Otsu's method, determines whether the value of the element for each predetermined duration is equal to or larger than the second threshold, and extracts a principal element in the weighting matrix.

The significant log extraction unit 156 generates a significant log by extracting, from the classified text log 52, a record including only IDs of principal elements in the basis matrix extracted by the determination unit 155.

The sequence extraction unit 157 extracts a sequence from the significant log generated by the significant log extraction unit 156 (Step S107). For example, the sequence extraction unit 157 extracts, for a combination of IDs included in the pattern, a sequence whose number of appearances is equal to or larger than a predetermined threshold and which satisfies a predetermined condition among sequences of the IDs.

Effects in First Embodiment

The classification unit 151 classifies messages included in a text log output from a system for each type, and gives an ID set for each type to each of the classified messages. Based on the dates of occurrence attached to messages, the creation unit 152 creates a matrix indicating the appearance distribution of the messages in the text log for each predetermined duration for each ID. The pattern extraction unit 153 extracts a plurality of patterns as combinations of IDs from the matrix created by the creation unit 152. The removal unit 154 removes a part or whole of the patterns from the matrix. The determination unit 155 calculates the degree of importance for each element included in each of the patterns, and determines whether the degree of importance is equal to or higher than a predetermined threshold. The significant log extraction unit 156 and the sequence extraction unit 157 extract predetermined information on principal elements from the classified text log 52.

The analysis device 10 in the first embodiment creates a matrix based on the appearance distribution of a plurality of messages, and can thus extract patterns based on the messages to perform monitoring taking the relation among the messages into consideration. Consequently, for example, a series of predictive patterns and propagation patterns upon the occurrence of a fault can be monitored and utilized for preventive maintenance and cause estimation of faults. By performing the monitoring taking the relation among messages into consideration, the occurrence of a wolf alert, that is, an alert based on erroneous detection that an anomaly has occurred though no anomaly has actually occurred, can be suppressed.

The analysis device 10 in the first embodiment creates a matrix from the entire collected text log, and hence information on the entire text log can be reflected to a matrix, and, for example, useful information can be obtained from text logs that have been otherwise overlooked by simple monitoring of individual messages.

By classifying a text log to form a matrix before calculation, a massive amount of text logs can be efficiently analyzed. By removing frequent patterns, a pattern that cannot be extracted as a frequent pattern due to the presence of a pattern related to regular processing and a pattern included in an error can now be extracted, and useful information for anomaly detection can be obtained. By determining whether an element included in a frequent pattern is a principal element, an important message can be extracted from a text log. By extracting information related to the extracted important message, the calculation amount and processing time required for analysis of messages can be reduced.

For example, the determination unit 155 can calculate the degree of importance for each ID included in each of a plurality of patterns, and determine whether the degree of importance is equal to or higher than a predetermined threshold. In this case, the sequence extraction unit 157 extracts a particular sequence from sequences indicating the order of appearance of IDs that have been determined by the determination unit 155 to have degrees of importance equal to or higher than a predetermined threshold. By extracting a sequence of principal messages in this manner, the calculation amount and processing time for extracting a sequence can be reduced to facilitate analysis based on the sequence of messages.

The classification unit 151 classifies messages included in a text log output from a system for each type, and gives an ID set for each type to each of the classified messages. Based on the dates of occurrence attached to messages, the creation unit 152 creates a log matrix that is a matrix indicating the appearance distribution of the messages in the text log for each ID for each predetermined duration. The pattern extraction unit 153 decomposes the log matrix to extract a basis matrix whose column vectors are patterns as combinations of IDs and a weighting matrix whose row vectors indicate how frequently the pattern appears for each predetermined duration. The removal unit 154 removes a part or whole of the patterns from the log matrix.

The analysis device 10 in the first embodiment extracts a plurality of relevant messages as patterns, and hence an event occurring in the system can be easily specified. By removing patterns, a pattern indicating a less frequent event such as a system fault can be extracted and utilized for monitoring. For example, an extracted fault predictive pattern can be monitored and utilized for preventive maintenance of faults, and a propagation pattern upon the occurrence of a fault can be specified and utilized for cause estimation. By performing the monitoring taking the relation among messages into consideration, the occurrence of a wolf alert, that is, an alert based on erroneous detection that an anomaly has occurred though no anomaly has actually occurred, can be suppressed.

The analysis device 10 in the first embodiment classifies a text log, and can thus compress several thousands to several hundred millions of messages to types on the order of several hundreds to several thousands, which can be grasped by humans. A matrix is created from the entire collected text logs, and hence information on a massive amount of logs can be reflected to one matrix, and, for example, useful information can be obtained from text logs that have been otherwise overlooked by simple monitoring of individual messages, or that have been archived and not utilized.

The pattern extraction unit 153 may decompose a log matrix by non-negative matrix factorization. In this manner, by classifying a text log to form a matrix before calculation, a massive amount of text logs can be efficiently analyzed. By removing patterns, patterns having a low frequency of appearance, that is, a pattern that was not extracted due to the presence of a pattern related to regular processing and a pattern that was included in an error upon the first pattern extraction, can now be extracted, and hence useful information for anomaly detection can be obtained.

The removal unit 154 can remove a pattern by subtracting, from a log matrix, any one of the product of a basis matrix and a significant weighting matrix obtained by replacing the values of elements smaller than a predetermined threshold in each pattern in a weighting matrix with 0, the product of a significant basis matrix obtained by replacing the values of elements smaller than a predetermined threshold in each pattern in a basis matrix with 0 and a weighting matrix, and the product of the significant basis matrix and the significant weighting matrix. The removal unit 154 can use any one of the product of a basis matrix and a non-significant weighting matrix obtained by replacing the values of elements equal to or larger than a predetermined threshold in each pattern in a weighting matrix with 0, the product of a non-significant basis matrix obtained by replacing the values of elements equal to or larger than a predetermined threshold in each pattern in a basis matrix with 0 and a weighting matrix, and the product of the non-significant basis matrix and the non-significant weighting matrix as a matrix obtained by removing patterns from the log matrix. In this manner, the influence of patterns related to regular processing that has a certain level or more can be removed.

The pattern extraction unit 153 may further extract frequent patterns from a matrix from which frequent patterns have been removed by the removal unit 154. In this manner, even when the influence of a pattern related to regular processing has not been removed in the removal of frequent patterns once, the influence of the pattern related to regular processing can be further removed to extract a less frequent pattern.

The classification unit 151 classifies messages included in a text log output from a system for each type, and gives an ID set for each type to each of the classified messages. Based on the dates of occurrence attached to messages, the creation unit 152 creates a matrix indicating the frequency of appearance of the messages in the text log for each ID for each predetermined duration. The pattern extraction unit 153 extracts a combination of IDs whose frequencies of appearance of messages in the same duration are equal to or higher than a predetermined value from the matrix created by the creation unit 152 as a frequent pattern. The removal unit 154 removes the frequent pattern from the matrix. The determination unit 155 determines, for each ID included in another frequent pattern extracted by the pattern extraction unit 153 from the matrix from which the frequent pattern has been removed by the removal unit 154, whether the frequency of appearance in a text log of a corresponding message satisfies a predetermined condition. The sequence extraction unit 157 extracts a particular sequence from sequences of IDs whose frequencies of appearance in the text log have been determined by the determination unit 155 to satisfy the predetermined condition.

The analysis device 10 in the first embodiment creates a matrix based on the frequencies of appearance of a plurality of messages, and can thus extract a pattern based on the messages to perform monitoring taking the relation among the messages into consideration. Consequently, for example, a series of predictive patterns and propagation patterns upon the occurrence of a fault can be monitored and utilized for preventive maintenance and cause estimation of faults. By performing the monitoring taking the relation among messages into consideration, the occurrence of a wolf alert, that is, an alert based on erroneous detection that an anomaly has occurred though no anomaly has actually occurred, can be suppressed.

By determining whether an element included in a frequent pattern extracted from a text log in the form of a matrix is a principal element, an important message can be extracted from the text log.

The determination unit 155 may calculate a first threshold by using Otsu's method based on the frequency of appearance of a message for each ID, and calculate the degree of importance for an ID whose frequency of appearance of the message is equal to or larger than the first threshold. In this manner, load for calculating the degree of importance can be reduced.

The pattern extraction unit 153 may further extract the frequencies of appearance of messages related to a combination for each predetermined duration. In this case, the determination unit 155 may calculate a second threshold by using Otsu's method based on the frequencies of appearance of messages related to a combination for each predetermined duration, and determine whether, for each predetermined duration, the frequency of appearance of the message is equal to or higher than the second threshold. In this manner, not only analysis based on the contents of a message but also analysis based on the date of occurrence of a message can be performed.

By classifying a text log to form a matrix before calculation, a massive amount of text logs can be efficiently analyzed. By removing frequent patterns, a pattern that cannot be extracted as a frequent pattern due to the presence of a pattern related to regular processing and a pattern included in an error can now be extracted, and useful information for anomaly detection can be obtained. By determining whether an element included in a frequent pattern is a principal element, an important message can be extracted from a text log. By extracting a sequence of principal messages, the calculation amount and processing time for extracting a sequence can be reduced to facilitate analysis based on a sequence of messages.

The classification unit 151 classifies messages included in a text log output from a system for each type, and gives an ID set for each type to each of the classified messages. Based on the dates of occurrence attached to messages, the creation unit 152 creates a matrix indicating the appearance distribution of the messages in the text log for each ID for each predetermined duration. The pattern extraction unit 153 extracts a plurality of patterns, which are combinations of IDs, from the matrix created by the creation unit 152. The determination unit 155 calculates the degree of importance for each ID included in each of the patterns, and determines whether the degree of importance is equal to or higher than a predetermined threshold. The significant log extraction unit 156 generates a significant log by extracting, from a log obtained by replacing each message in the text log with an ID given by the classification unit 151, only an ID determined by the determination unit 155 to be equal to or larger than a predetermined threshold. The sequence extraction unit 157 counts, from the generated significant log, the number of appearances of each sequence indicating the order of appearance of IDs having a high degree of importance, and extracts a sequence the number of appearances of which is equal to or larger than a predetermined threshold and which satisfies a predetermined condition.

The analysis device 10 in the present embodiment creates a matrix based on the frequencies of appearance of a plurality of messages, and can thus extract a pattern based on the messages, and perform the monitoring taking the relation among messages into consideration. Consequently, for example, a series of predictive patterns and propagation patterns upon the occurrence of a fault can be monitored and utilized for preventive maintenance and cause estimation of faults. By performing the monitoring taking the relation among messages into consideration, the occurrence of a wolf alert, that is, an alert based on erroneous detection that an anomaly has occurred though no anomaly has actually occurred, can be suppressed.

By extracting a sequence of principal messages included in a pattern extracted from a text log in the form of a matrix, the calculation amount and processing time for extracting a sequence can be reduced to facilitate analysis of messages based on the sequence.

The pattern extraction unit 153 may further extract the degree of appearance of a pattern for each predetermined duration. In this case, the determination unit 155 calculates a second degree of importance of a pattern for each predetermined duration, and further determines whether the second degree of importance is equal to or higher than a predetermined second threshold. The significant log extraction unit 156 generates a significant log by extracting only a predetermined duration determined by the determination unit 155 to be equal to or larger than the predetermined second threshold.

In this manner, IDs related to events in the system have been specified for each event to some degree at the time of the pattern extraction, and hence by generating a significant log for each pattern and extracting a sequence therefrom, an event in the system can be easily interpreted from the extracted sequence.

Second Embodiment

The method of matrix decomposition in the present invention is not limited to NMF described in the first embodiment. In the present invention, as the method of matrix decomposition, for example, methods for a matrix including values other than non-negative values such as principal component analysis and independent component analysis may be used. As a second embodiment, the case where matrix decomposition is performed by using a method other than NMF is described.

In the second embodiment, the removal unit 154 uses a method such as principal component analysis or independent component analysis to decompose a log matrix into a basis matrix and a weighting matrix. In the second embodiment, the method for creating a significant basis matrix, a significant weighting matrix, a non-significant basis matrix, and a non-significant weighting matrix is different from that in the first embodiment. In the first embodiment, the removal unit 154 uses a significant basis matrix obtained by replacing the values of elements in the basis matrix smaller than a predetermined threshold with 0, and uses a significant weighting matrix obtained by replacing the values of elements in the weighting matrix smaller than a predetermined threshold with 0.

In the second embodiment, on the other hand, the removal unit 154 compares the absolute value of an element with a threshold to determine whether to replace the value of the element with 0. Specifically, the removal unit 154 uses a significant basis matrix obtained by replacing with 0 the values of elements in the basis matrix whose absolute values are smaller than a predetermined threshold, and uses a significant weighting matrix obtained by replacing with 0 the values of elements in the weighting matrix whose absolute values are smaller than a predetermined threshold.

In the second embodiment, the removal unit 154 may determine whether to replace the value of an element with 0 by using a positive threshold when the value of the element is positive and by using a negative threshold when the value of the element is negative. Specifically, the removal unit 154 can use a significant basis matrix obtained by replacing with 0 the values of elements in the basis matrix which are positive and smaller than a positive threshold and the values of elements in the basis matrix which are negative and larger than a negative threshold, and use a significant weighting matrix obtained by replacing with 0 the values of elements in the weighting matrix which are positive and smaller than a positive threshold and the values of elements in the weighting matrix which are negative and larger than a negative threshold.

In the second embodiment, to create a non-significant basis matrix and a non-significant weighting matrix, the removal unit 154 replaces with 0 the values of the elements that were not replaced with 0 when the significant basis matrix and the significant weighting matrix were created. Specifically, the removal unit 154 can use a non-significant basis matrix obtained by replacing with 0 the values of elements in a basis matrix whose absolute values are equal to or larger than a predetermined threshold, and use a non-significant weighting matrix obtained by replacing with 0 the values of elements in a weighting matrix whose absolute values are equal to or larger than a predetermined threshold.

Alternatively, the removal unit 154 can use a non-significant basis matrix obtained by replacing with 0 the values of elements in a basis matrix which are positive and equal to or larger than a positive threshold and the values of elements which are negative and equal to or smaller than a negative threshold, and use a non-significant weighting matrix obtained by replacing with 0 the values of elements in a weighting matrix which are positive and equal to or larger than a positive threshold and the values of elements which are negative and equal to or smaller than a negative threshold.
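The two thresholding rules of the second embodiment can be illustrated with a short sketch. It assumes a matrix stored as a list of rows; the function names `split_by_magnitude` and `split_by_signed_thresholds` and the sample values are hypothetical. In both rules every element ends up in exactly one of the two outputs, so the significant and non-significant matrices sum to the original.

```python
def split_by_magnitude(matrix, threshold):
    """Split using a single threshold on the absolute value of each element."""
    significant = [[v if abs(v) >= threshold else 0.0 for v in row] for row in matrix]
    non_significant = [[v if abs(v) < threshold else 0.0 for v in row] for row in matrix]
    return significant, non_significant


def split_by_signed_thresholds(matrix, pos_threshold, neg_threshold):
    """Split using a positive threshold for positive values and a negative one for negative values."""
    def is_significant(v):
        return (v > 0 and v >= pos_threshold) or (v < 0 and v <= neg_threshold)

    significant = [[v if is_significant(v) else 0.0 for v in row] for row in matrix]
    non_significant = [[0.0 if is_significant(v) else v for v in row] for row in matrix]
    return significant, non_significant


# Hypothetical basis matrix containing negative values (e.g. obtained by PCA).
basis = [[0.9, -0.05], [-0.7, 0.2]]
sig_abs, non_abs = split_by_magnitude(basis, 0.5)
sig_signed, non_signed = split_by_signed_thresholds(basis, 0.1, -0.6)
```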

Third Embodiment

The analysis device of the present invention can decompose an input matrix having item indices as items in each row and having instance indices as items in each column into the product of two matrices. In this case, the input matrix is not limited to a log matrix created based on messages included in a text log output from a system.

For example, the input matrix may be a purchase log matrix indicating the quantity of each product purchased by each customer, based on a purchase log having purchase information indicating which product has been purchased by each customer. In this case, item indices in the input matrix are IDs that can identify products. Instance indices in the input matrix are IDs that can identify customers. The value of an element in the input matrix is purchase information on a product. Examples of the purchase information include, but are not limited to, the quantity of purchase, the logarithm of the quantity of purchase, and the presence/absence of purchase (values of 1 for purchase and 0 for non-purchase). In the following embodiment, an example in which the quantity of purchase is used as purchase information is described.

In a third embodiment, the pattern extraction unit 153 extracts a basis matrix whose column vectors are a plurality of patterns as a combination of item indices and a weighting matrix whose row vectors are each a weight in the instance indices in each pattern. The determination unit 155 calculates the degree of importance for each item index included in a plurality of patterns, and determines whether the degree of importance is equal to or higher than a predetermined threshold.

Processing in Third Embodiment

Referring to FIG. 17, the flow of processing by the analysis device according to the third embodiment is described. FIG. 17 has been referred to for the description of the flow of the processing by the analysis device in the first embodiment. The analysis device in the third embodiment performs processing with the same flow as that of the analysis device in the first embodiment, and hence FIG. 17 is also referred to for the following description.

First, the classification unit 151 classifies, for each product, purchase logs having information indicating which product has been purchased by each customer, and gives a product ID (Step S101). Based on information on customers in the purchase logs and customer IDs given by the classification unit 151, the creation unit 152 creates a purchase log matrix indicating the quantity of purchase of each product by each customer, that is, an input matrix (Step S102).

In the third embodiment, the analysis device 10 may or may not create a log matrix indicated by Steps S101 and S102. In the case where the analysis device 10 does not create a log matrix, an input matrix may be input from the outside. In the following description, item indices as items in each row of the purchase log matrix are product IDs, and instance indices as items in each column are customer IDs.
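As an illustration of Steps S101 and S102, the following sketch builds a purchase log matrix with product IDs as rows and customer IDs as columns. The record layout (customer, product, quantity), the function name `build_purchase_matrix`, and the sample data are assumptions made for this example only.

```python
def build_purchase_matrix(purchase_log):
    """Return (product_ids, customer_ids, matrix) with products as rows, customers as columns.

    purchase_log is a list of (customer, product, quantity) records (assumed layout).
    """
    product_ids = sorted({product for _, product, _ in purchase_log})
    customer_ids = sorted({customer for customer, _, _ in purchase_log})
    row = {product: i for i, product in enumerate(product_ids)}
    col = {customer: j for j, customer in enumerate(customer_ids)}
    matrix = [[0] * len(customer_ids) for _ in product_ids]
    for customer, product, quantity in purchase_log:
        # Repeated purchases of the same product by the same customer accumulate.
        matrix[row[product]][col[customer]] += quantity
    return product_ids, customer_ids, matrix


# Hypothetical purchase log.
purchase_log = [
    ("c1", "apple", 2),
    ("c2", "apple", 1),
    ("c1", "bread", 1),
    ("c3", "milk", 4),
    ("c2", "apple", 3),
]
product_ids, customer_ids, matrix = build_purchase_matrix(purchase_log)
```

The quantity of purchase is used as the element value here; taking the logarithm or recording only presence/absence would change the accumulation line alone.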

Next, the pattern extraction unit 153 decomposes the purchase log matrix to extract a basis matrix and a weighting matrix (Step S103). For example, the pattern extraction unit 153 decomposes the purchase log matrix by NMF, and extracts a basis matrix in which each pattern is a combination of IDs of products that are purchased by many customers and a weighting matrix indicating a combination of IDs of customers who purchase the products corresponding to each pattern.
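A minimal sketch of the decomposition in Step S103 is given below. It uses the well-known multiplicative update rules for NMF with a squared-error objective; the document does not state which NMF algorithm is used, so this choice, the function names, the iteration count, and the sample matrix are all assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Sum of squared differences between v and the product w * h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor v (items x instances) into a basis matrix w and a weighting matrix h."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # Update the weighting matrix: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        # Update the basis matrix: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return w, h


# Hypothetical purchase log matrix: rows are product IDs, columns are customer IDs.
purchase_matrix = [
    [2, 3, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 1, 5],
    [0, 0, 2, 10],
]
basis, weights = nmf(purchase_matrix, rank=2)
```

Each column of `basis` is one pattern over product IDs, and the corresponding row of `weights` gives the weight of that pattern for each customer ID.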

In the third embodiment, the analysis device 10 may or may not remove a frequent pattern indicated by Steps S104 and S105. In the case where a frequent pattern is not removed, the analysis device 10 executes Step S103 and then executes Step S106 without executing Step S105 (No at Step S104).

Next, the determination unit 155 determines whether each element in the basis matrix is a principal element (Step S106). In this case, the determination unit 155 calculates the degree of importance for each of product IDs included in each of a plurality of patterns, and determines whether the degree of importance is equal to or higher than a predetermined threshold. In the third embodiment, the analysis device 10 may or may not extract a sequence indicated by Step S107.

For calculating the degree of importance and determining principal elements, the determination unit 155 can appropriately use the method for calculating the degree of importance based on TF-IDF and the method for calculating the threshold by Otsu's method independently or in combination similarly to the first embodiment.

For example, the determination unit 155 can use the value of an element for each product ID included in a pattern as the degree of importance, and use, as a threshold, a threshold calculated by using Otsu's method based on the value of the element for each product ID included in the pattern.
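The thresholding just described can be sketched as follows. Otsu's method is usually applied to a histogram; this sketch instead evaluates every split point between sorted sample values and keeps the one that maximizes the between-class variance, which is an assumption made for brevity. The function name `otsu_threshold` and the pattern values are hypothetical.

```python
def otsu_threshold(values):
    """Return the smallest value of the upper class under the best two-class split.

    The best split maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2.
    Expects a non-empty list of numbers.
    """
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    best_score, best_threshold = -1.0, ordered[0]
    low_sum = 0.0
    for k in range(1, n):
        low_sum += ordered[k - 1]
        if ordered[k] == ordered[k - 1]:
            continue  # a split between equal values is not meaningful
        w0, w1 = k / n, (n - k) / n
        mu0, mu1 = low_sum / k, (total - low_sum) / (n - k)
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_threshold = score, ordered[k]
    return best_threshold


# Hypothetical column vector of the basis matrix: element value for each product ID.
pattern = {"apple": 0.82, "bread": 0.75, "milk": 0.05, "soap": 0.02, "tea": 0.01}
threshold = otsu_threshold(list(pattern.values()))
principal = sorted(p for p, value in pattern.items() if value >= threshold)
```

Here the element value itself serves as the degree of importance, so products whose value is equal to or higher than the returned threshold are treated as principal elements of the pattern.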

The determination unit 155 can calculate, for each product included in the pattern, the degree of importance so as to be higher as the value of the element for each product becomes higher and be lower as the number of patterns including the product becomes larger.
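One way to obtain a degree of importance with these two properties is a TF-IDF-style weighting, sketched below. The exact formula of the first embodiment is not reproduced here; the weighting `value * log(1 + K / df)`, the presence cutoff used to decide whether a pattern "includes" a product, and the function name `importance` are all assumptions.

```python
import math


def importance(basis, presence=0.1):
    """Return a matrix of importance scores with the same shape as the basis matrix.

    basis[i][k] is the element value of product i in pattern k. A pattern is
    assumed to include a product when the value is at least `presence`.
    """
    n_patterns = len(basis[0])
    scores = []
    for row in basis:
        df = sum(1 for v in row if v >= presence)  # number of patterns including the product
        idf = math.log(1 + n_patterns / df) if df else 0.0
        scores.append([v * idf for v in row])
    return scores


# Hypothetical basis matrix: 3 products (rows) x 3 patterns (columns).
basis = [
    [0.8, 0.7, 0.9],  # product 0 appears in every pattern
    [0.8, 0.0, 0.0],  # product 1 appears only in pattern 0
    [0.0, 0.6, 0.0],  # product 2 appears only in pattern 1
]
scores = importance(basis)
```

In pattern 0, products 0 and 1 have the same element value, but product 1 receives the higher score because it appears in fewer patterns.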

The determination unit 155 can calculate a first threshold by using Otsu's method for the value of an element for each product ID in each pattern, that is, all elements in each column vector of the basis matrix, and calculate the degree of importance for a product ID whose value of the element is equal to or higher than the first threshold.

The pattern extraction unit 153 may further extract an ID of a customer who has purchased a product corresponding to the pattern. In this case, the determination unit 155 calculates a second degree of importance for each customer ID in each pattern, that is, all elements in a row vector of the weighting matrix, and further determines whether the second degree of importance is equal to or higher than the predetermined second threshold.

The determination unit 155 can use the value of an element for each customer ID in the pattern as the second degree of importance, and use, as the second threshold, a threshold calculated by using Otsu's method based on the value of the element for each customer ID included in the pattern.

The determination unit 155 calculates the second degree of importance so as to be higher as the value of the element for each predetermined customer ID in the pattern becomes higher and be lower as the number of patterns including the predetermined customer ID becomes larger.

The determination unit 155 calculates a third threshold by using Otsu's method based on the value of the element for each customer ID in each pattern, and calculates the second degree of importance for a customer ID whose value of the element is equal to or higher than the third threshold.

Effects in Third Embodiment

The pattern extraction unit 153 extracts a basis matrix whose column vectors are a plurality of patterns as a combination of item indices and a weighting matrix whose row vectors are each a weight of instance indices in each pattern. The determination unit 155 calculates the degree of importance for each item index included in each of the patterns, and determines whether the degree of importance is equal to or higher than a predetermined threshold. In this manner, the analysis device 10 in the third embodiment can efficiently extract an important item even when the size of an input matrix is very large.

In particular, when item indices of an input matrix are product IDs, instance indices are customer IDs, and the value of each element is the quantity of purchase, the analysis device 10 in the third embodiment can extract a pattern based on a plurality of products, and extract a group of products that are highly likely to be purchased by the same customer. Thus, for example, when a product A and a product B are likely to be purchased by the same customer, it can be known that a customer who has purchased only the product A may also purchase the product B, and the product B can be recommended to the customer. Effects obtained when item indices of an input matrix are product IDs, instance indices are customer IDs, and the value of each element is the quantity of purchase are described below. According to the present invention, the same effects can be obtained for any input matrix having item indices as items in each row and instance indices as items in each column.
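The product A and product B recommendation mentioned above can be sketched as follows. This is an illustrative use of the extracted patterns rather than a function of the described device; the representation of each pattern as a set of its principal product IDs, the function name `recommend`, and the sample data are assumptions.

```python
def recommend(pattern_products, purchases):
    """Suggest products that share a pattern with something the customer already bought.

    pattern_products: list of sets, each holding the principal product IDs of one pattern.
    purchases: dict mapping a customer ID to the set of product IDs already purchased.
    """
    recommendations = {}
    for customer, bought in purchases.items():
        suggested = set()
        for products in pattern_products:
            if bought & products:  # the customer bought part of this pattern
                suggested |= products - bought
        recommendations[customer] = sorted(suggested)
    return recommendations


# Hypothetical patterns and purchase history.
patterns = [{"A", "B"}, {"C", "D"}]
purchases = {"alice": {"A"}, "bob": {"A", "B"}, "carol": {"A", "D"}}
result = recommend(patterns, purchases)
```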

The determination unit 155 can use the value of an element for each product ID included in a pattern as the degree of importance, and use, as a threshold, a threshold calculated by using Otsu's method based on the value of the element for each product ID included in the pattern. In this manner, a product having a high degree of importance can be extracted.

The determination unit 155 can calculate, for each product included in the pattern, the degree of importance so as to be higher as the value of the element for each product becomes higher and be lower as the number of patterns including the product becomes larger. In this manner, a characteristic product can be extracted for each pattern.

The determination unit 155 can calculate a first threshold by using Otsu's method for the value of an element for each product ID in each pattern, that is, all elements in each column vector of the basis matrix, and calculate the degree of importance for a product ID whose value of the element is equal to or larger than the first threshold. In this manner, load for calculating the degree of importance can be reduced.

The pattern extraction unit 153 may further extract an ID of a customer who has purchased a product corresponding to the pattern. In this case, the determination unit 155 calculates a second degree of importance for each customer ID in each pattern, that is, all elements in a row vector of the weighting matrix, and further determines whether the second degree of importance is equal to or higher than a predetermined second threshold. In this manner, a customer having a high degree of importance can be extracted.

The determination unit 155 can use the value of an element for each customer ID in the pattern as the second degree of importance, and use, as the second threshold, a threshold calculated by using Otsu's method based on the value of the element for each customer ID included in the pattern. In this manner, load for calculating the degree of importance can be reduced.

The determination unit 155 calculates the second degree of importance so as to be higher as the value of the element for each predetermined customer ID in the pattern becomes higher and be lower as the number of patterns including the predetermined customer ID becomes larger. In this manner, a characteristic customer can be extracted for each pattern.

The determination unit 155 calculates a third threshold by using Otsu's method based on the value of the element for each customer ID in each pattern, and calculates a second degree of importance for a customer ID whose value of the element is equal to or higher than the third threshold. In this manner, load for calculating the degree of importance can be reduced.

Other Embodiments

Dictionary information 142 created based on a text log 51 is not limited to the one illustrated in FIG. 3. For example, as illustrated in FIG. 18, a shorter character string may be used as a template. FIG. 18 is a diagram illustrating an example of a data configuration of dictionary information according to another embodiment. When a message that does not match any template is included in a text log to be analyzed, the analysis device 10 may add the message to the dictionary information 142 as necessary.
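The behavior of adding an unmatched message to the dictionary can be sketched as below. The data configuration shown in FIG. 3 and FIG. 18 is not reproduced here, so the template syntax (a `*` standing for one variable token), the class name `TemplateDictionary`, and the sample messages are hypothetical.

```python
import re


class TemplateDictionary:
    """Maps message templates to IDs and registers messages that match no template."""

    def __init__(self, templates):
        # templates: dict of integer ID -> template string, where "*" marks a variable token
        self.templates = dict(templates)

    @staticmethod
    def _to_regex(template):
        # Escape the fixed parts and let each "*" match one whitespace-free token.
        parts = [re.escape(part) for part in template.split("*")]
        return re.compile(r"\S+".join(parts))

    def classify(self, message):
        """Return the ID of the first matching template, adding the message if none match."""
        for template_id, template in self.templates.items():
            if self._to_regex(template).fullmatch(message):
                return template_id
        new_id = max(self.templates, default=0) + 1
        self.templates[new_id] = message  # register the unmatched message as a new template
        return new_id


dictionary = TemplateDictionary({
    1: "LINK-UP Interface *",
    2: "Failed password for * from *",
})
first = dictionary.classify("LINK-UP Interface ge-0/0/1")
second = dictionary.classify("Failed password for root from 10.0.0.1")
third = dictionary.classify("kernel panic")
again = dictionary.classify("kernel panic")
```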

System configurations, etc.

The components of the illustrated devices are conceptually illustrative, and are not necessarily required to be physically configured as illustrated. In other words, a specific mode for dispersion and integration of the devices is not limited to the illustrated one, and all or part of the devices can be functionally or physically dispersed and integrated in any unit depending on various kinds of loads, usage conditions, and any other parameter. In addition, all or any part of the processing functions executed by the devices may be implemented by a CPU and computer programs analyzed and executed by the CPU, or implemented as hardware using wired logic.

Among the processing contents described in the above-mentioned embodiments, all or part of the processing that is described as being automatically executed can also be manually executed, or all or part of the processing that is described as being manually executed can also be automatically executed by a known method. In addition, the processing procedures, the control procedures, the specific names, and the information including various kinds of data and parameters described herein and illustrated in the accompanying drawings can be freely changed unless otherwise specified.

Computer Programs

In one embodiment, the analysis device 10 can be implemented by installing an analysis program for executing the above-mentioned analysis on a desired computer as package software or online software. For example, by causing an information processing device to execute the above-mentioned analysis program, the information processing device can be caused to function as the analysis device 10. The information processing device as used herein includes a desktop or notebook personal computer. In addition thereto, the category of the information processing device includes mobile communication terminals such as smartphones, mobile phones, and personal handyphone systems (PHS) and slate terminals such as personal digital assistants (PDAs).

The analysis device 10 can be implemented as an analysis server device such that a terminal device used by a user is a client and service related to the above-mentioned analysis is provided to the client. For example, the analysis server device is implemented as a server device for providing analysis service by inputting text logs and outputting extracted IDs. In this case, the analysis server device may be implemented as a Web server, or may be implemented as a cloud for providing service related to the above-mentioned analysis by outsourcing.

FIG. 19 is a diagram illustrating an example of a computer on which an analysis device is implemented when a computer program is executed. For example, a computer 1000 includes a memory 1010 and a CPU 1020. The computer 1000 includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a read only memory (ROM) 1011 and a RAM 1012. For example, the ROM 1011 stores therein a boot program such as a basic input output system (BIOS). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disc is inserted to the disk drive 1100. For example, the serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120. For example, the video adapter 1060 is connected to a display 1130.

For example, the hard disk drive 1090 stores an OS 1091, an application program 1092, a program module 1093, and program data 1094 therein. That is, computer programs defining the processing in the analysis device 10 are implemented as the program module 1093 in which computer-executable codes are written. For example, the program module 1093 is stored in the hard disk drive 1090. For example, the program module 1093 for executing the same processing as the functional configuration in the analysis device 10 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be substituted by an SSD.

Setting data used for the processing in the above-mentioned embodiment is stored, for example, in the memory 1010 or the hard disk drive 1090 as program data 1094. The CPU 1020 reads the program module 1093 or the program data 1094 stored in the memory 1010 or the hard disk drive 1090 onto the RAM 1012 and executes the read program module or program data as needed.

Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, and, for example, may be stored in a removable storage medium and read by the CPU 1020 through the disk drive 1100. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected through a network (such as local area network (LAN) and wide area network (WAN)). The program module 1093 and the program data 1094 may be read by the CPU 1020 from another computer through the network interface 1070.

REFERENCE SIGNS LIST

-   10 ANALYSIS DEVICE
-   11 COMMUNICATION UNIT
-   12 INPUT UNIT
-   13 OUTPUT UNIT
-   14 STORAGE UNIT
-   15 CONTROL UNIT
-   141 OUTPUT LOG INFORMATION
-   142 DICTIONARY INFORMATION
-   151 CLASSIFICATION UNIT
-   152 CREATION UNIT
-   153 PATTERN EXTRACTION UNIT
-   154 REMOVAL UNIT
-   155 DETERMINATION UNIT
-   156 SIGNIFICANT LOG EXTRACTION UNIT
-   157 SEQUENCE EXTRACTION UNIT

1. An analysis device, comprising: a memory; and a processor coupled to the memory and programmed to execute a process comprising: classifying messages included in a text log output from a system for each type, and giving an ID set for each type to each of the classified messages; creating, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; firstly extracting a plurality of patterns, which are combinations of the IDs, from the matrix created by the creating; removing a part or whole of the patterns from the matrix; calculating a degree of importance for each element included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold; and secondly extracting, from the text log, predetermined information on an element whose degree of importance has been determined by the determining to be equal to or higher than the predetermined threshold.
2. The analysis device according to claim 1, wherein the calculating calculates the degree of importance for each ID included in each of the patterns, and the determining determines whether the degree of importance is equal to or higher than a predetermined threshold, and the secondly extracting extracts a particular sequence from sequences indicating an order of appearance of IDs that have been determined by the determining to have the degree of importance equal to or higher than the predetermined threshold.
3. An analysis device for decomposing an input matrix having item indices as items in each row and having instance indices as items in each column into a product of two matrices, the analysis device comprising: a memory; and a processor coupled to the memory and programmed to execute a process comprising: extracting a basis matrix in which column vectors are a plurality of patterns as combinations of the item indices and a weighting matrix in which weightings of the instance indices in each of the patterns are row vectors; and calculating a degree of importance for each item index included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold.
4. The analysis device according to claim 3, wherein the calculating uses a value of an element for each item index in the basis matrix included in the pattern as the degree of importance, and uses, as the threshold, a threshold calculated by using Otsu's method based on values of elements for each of all item indices in the basis matrix included in the pattern.
5. The analysis device according to claim 3, wherein the calculating calculates the degree of importance so as to be higher as a value of an element for each item index in the basis matrix in the pattern becomes higher and be lower as number of the patterns including item indices in the basis matrix becomes larger.
6. The analysis device according to claim 5, wherein the calculating calculates a first threshold by using Otsu's method based on the value of the element for each item index in the basis matrix, and calculates the degree of importance for an item index whose value of the element is equal to or larger than the first threshold.
7. The analysis device according to claim 3, wherein the extracting further extracts a value of an element in the pattern for each instance index in a predetermined weighting matrix, and the calculating calculates a second degree of importance of the pattern for each instance index in the predetermined weighting matrix, and the determining further determines whether the second degree of importance is equal to or higher than a predetermined second threshold.
8. The analysis device according to claim 7, wherein the calculating uses a value of an element in the pattern for each instance index in the predetermined weighting matrix as the second degree of importance, and uses, as the second threshold, a threshold calculated by using Otsu's method based on the value of the element.
9. The analysis device according to claim 7, wherein the calculating calculates the second degree of importance so as to be higher as a value of an element of the pattern for each instance index in the predetermined weighting matrix becomes higher and be lower as number of the patterns having a value of an element of the instance index in the predetermined weighting matrix becomes larger.
10. The analysis device according to claim 9, wherein the calculating calculates a third threshold by using Otsu's method based on a value of an element for each instance index in the predetermined weighting matrix, and calculates the second degree of importance for an instance index whose value of an element for each instance index in the predetermined weighting matrix is equal to or larger than the third threshold.
11. An analysis device, comprising: a memory; and a processor coupled to the memory and programmed to execute a process comprising: classifying messages included in a text log output from a system for each type, and giving an ID set for each type to each of the classified messages; creating, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; firstly extracting a plurality of patterns, which are combinations of the IDs, from the matrix created by the creating; calculating a degree of importance for each ID included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold; generating a significant log by extracting, from a log obtained by replacing each message in the text log with an ID given by the giving, only an ID determined by the determining to be equal to or larger than the predetermined threshold; and counting, from the generated significant log, number of appearances of each sequence indicating an order of appearance of IDs having a high degree of importance, and secondly extracting a sequence the number of appearances of which is equal to or larger than a predetermined threshold and which satisfies a predetermined condition.
12. The analysis device according to claim 11, wherein the firstly extracting further extracts a degree of appearance of the pattern for each predetermined duration, the calculating calculates a second degree of importance of the pattern for each predetermined duration, and the determining further determines whether the second degree of importance is equal to or higher than a predetermined second threshold, and the generating generates a significant log by extracting only a predetermined duration determined by the determining to be equal to or larger than the predetermined second threshold.
13. An analysis method to be executed by an analysis device, the analysis method comprising: classifying messages included in a text log output from a system for each type, and giving an ID set for each type to each of the classified messages; creating, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; extracting a plurality of patterns, which are combinations of the IDs, from the matrix created at the creating; removing a part or whole of the patterns from the matrix; calculating a degree of importance for each element included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold; and extracting, from the text log, predetermined information on an element whose degree of importance has been determined at the determining to be equal to or higher than the predetermined threshold.
14. An analysis method to be executed by an analysis device configured to decompose an input matrix having item indices as items in each row and having instance indices as items in each column into a product of two matrices, the analysis method comprising: extracting a basis matrix in which column vectors are a plurality of patterns as combinations of the item indices and a weighting matrix in which weighting of the instance indices in each of the patterns are row vectors; calculating a degree of importance for each item index included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold.
15. An analysis method to be executed by an analysis device, the analysis method comprising: classifying messages included in a text log output from a system for each type, and giving an ID set for each type to each of the classified messages; creating, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; extracting a plurality of patterns, which are combinations of the IDs, from the matrix created at the creating; calculating a degree of importance for each ID included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold; generating, from a log obtained by replacing each message in the text log with an ID given at the classifying, a significant log by extracting only an ID determined at the determining to be equal to or larger than the predetermined threshold; and counting, from the generated significant log, number of appearances of each sequence indicating an order of appearance of IDs having a high degree of importance, and extracting a sequence the number of appearances of which is equal to or larger than a predetermined threshold and which satisfies a predetermined condition.
16. A non-transitory computer-readable recording medium having stored therein a program, for analysis, that causes a computer to execute a process comprising: classifying messages included in a text log output from a system for each type, and giving an ID set for each type to each of the classified messages; creating, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; extracting a plurality of patterns, which are combinations of the IDs, from the matrix created at the creating; removing a part or whole of the patterns from the matrix; calculating a degree of importance for each element included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold; and extracting, from the text log, predetermined information on an element whose degree of importance has been determined at the determining to be equal to or higher than the predetermined threshold.
17. A non-transitory computer-readable recording medium having stored therein a program, for analysis, that causes a computer configured to decompose an input matrix having item indices as items in each row and having instance indices as items in each column into a product of two matrices to execute a process comprising: extracting a basis matrix in which column vectors are a plurality of patterns as combinations of the item indices and a weighting matrix in which weighting of the instance indices in each of the patterns are row vectors; calculating a degree of importance for each item index included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold.
18. A non-transitory computer-readable recording medium having stored therein a program, for analysis, that causes a computer to execute a process comprising: classifying messages included in a text log output from a system for each type, and giving an ID set for each type to each of the classified messages; creating, based on dates of occurrence attached to the messages, a matrix indicating an appearance distribution of the messages in the text log for each predetermined duration for each ID; extracting a plurality of patterns, which are combinations of the IDs, from the matrix created at the creating; calculating a degree of importance for each ID included in each of the patterns, and determining whether the degree of importance is equal to or higher than a predetermined threshold; generating, from a log obtained by replacing each message in the text log with an ID given at the classifying, a significant log by extracting only an ID determined at the determining to be equal to or larger than the predetermined threshold; and counting, from the generated significant log, number of appearances of each sequence indicating an order of appearance of IDs having a high degree of importance, and extracting a sequence the number of appearances of which is equal to or larger than a predetermined threshold and which satisfies a predetermined condition.